Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence
Abstract
Multi-objective problems with correlated objectives are a class of problems that deserves specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-)optimal solutions for any one objective are (near-)optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is highly relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal-difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective's estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique's decisions, yielding insights into the nature of the problems.
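The abstract describes adaptive objective selection only at a high level, so a minimal sketch may help make the mechanics concrete. The sketch below is an assumption-laden reconstruction in plain Python, not the paper's implementation: the class name, the tabular Q-learners, and in particular the confidence measure (negative variance of the greedy Q-value across a small ensemble of independently initialized learners) are illustrative stand-ins; the paper derives its confidence metric from the estimates maintained by its own function approximator. Each objective corresponds to one (shaped) reward signal, and all objectives are learned in parallel from the same transitions.

import numpy as np

class AdaptiveObjectiveSelector:
    """Hypothetical sketch of adaptive objective selection (names assumed)."""

    def __init__(self, n_states, n_actions, n_objectives,
                 ensemble_size=5, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_actions = n_actions
        # Q[o, k, s, a]: Q-estimate of ensemble member k for objective o.
        rng = np.random.default_rng(0)
        self.Q = rng.normal(0.0, 0.01,
                            size=(n_objectives, ensemble_size, n_states, n_actions))

    def select_action(self, s, rng):
        """Pick the most confident objective in state s, act greedily on it."""
        if rng.random() < self.epsilon:
            return int(rng.integers(self.n_actions))
        mean_q = self.Q[:, :, s, :].mean(axis=1)       # (n_objectives, n_actions)
        greedy = mean_q.argmax(axis=1)                 # greedy action per objective
        # Confidence stand-in: low ensemble variance of the greedy Q-value.
        var = self.Q[np.arange(self.Q.shape[0]), :, s, greedy].var(axis=1)
        o = int(var.argmin())                          # most confident objective
        return int(mean_q[o].argmax())

    def update(self, s, a, rewards, s_next, done):
        """One Q-learning step per objective; 'rewards' has one entry per objective."""
        for o, r in enumerate(rewards):
            for k in range(self.Q.shape[1]):
                target = r if done else r + self.gamma * self.Q[o, k, s_next].max()
                self.Q[o, k, s, a] += self.alpha * (target - self.Q[o, k, s, a])

# A typical step, with shaping functions F_i supplying the correlated objectives:
#   a = agent.select_action(s, rng)
#   rewards = [r_env] + [r_env + F_i(s, s_next) for F_i in shaping_fns]
#   agent.update(s, a, rewards, s_next, done)

If the F_i are potential-based shaping functions, every objective shares the base problem's optimal policies, which is what makes the objectives correlated rather than conflicting; the selector then simply acts on whichever signal it currently trusts most.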
Similar Resources
Multi-Objectivization in Reinforcement Learning
Multi-objectivization is the process of transforming a single objective problem into a multi-objective problem. Research in evolutionary optimization has demonstrated that the addition of objectives that are correlated with the original objective can make the resulting problem easier to solve compared to the original single-objective problem. In this paper we investigate the multi-objectivizati...
Dynamic shaping of dopamine signals during probabilistic Pavlovian conditioning.
Cue- and reward-evoked phasic dopamine activity during Pavlovian and operant conditioning paradigms is well correlated with reward-prediction errors from formal reinforcement learning models, which feature teaching signals in the form of discrepancies between actual and expected reward outcomes. Additionally, in learning tasks where conditioned cues probabilistically predict rewards, dopamine n...
Combining manual feedback with subsequent MDP reward signals for reinforcement learning
As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the tamer framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on tamer showed that...
A Robust Desirability-based Approach to Optimizing Multiple Correlated Responses
There are many real problems in which multiple responses should be optimized simultaneously by setting process variables. One of the common approaches for optimizing multi-response problems is the desirability function. In most real cases, there is a correlation structure between responses, so ignoring the correlation may lead to mistaken results. Hence, in this paper a robust approach based ...
Reward modulates adaptations to conflict.
Both cognitive conflict (e.g. Verguts & Notebaert, 2009) and reward signals (e.g. Waszak & Pholulamdeth, 2009) have been proposed to enhance task-relevant associations. Bringing these two notions together, we predicted that reward modulates conflict-based sequential adaptations in cognitive control. This was tested combining either a single flanker task (Experiment 1) or a task-switch paradigm ...
Publication year: 2014